Exploratory Data Analysis





Kerry Back

Data

SQL

  • select … from …
  • join … on …
  • where …
  • order by …

Connect with python

from sqlalchemy import create_engine
import pymssql
import pandas as pd

server = "mssql-82792-0.cloudclusters.net:16272"
username = "user"
password = "" # paste password between quote marks
database = "ghz"
string = "mssql+pymssql://" + username + ":" + password + "@" + server + "/" + database
conn = create_engine(string).connect()

Example: ROEQ and mom12m in 2021-12

data = pd.read_sql(
    """
    select ticker, date, ret, roeq, mom12m
    from data
    where date='2021-12'
    """, 
    conn
)
data = data.dropna()

data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2399 entries, 0 to 2406
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ticker  2399 non-null   object 
 1   date    2399 non-null   object 
 2   ret     2399 non-null   float64
 3   roeq    2399 non-null   float64
 4   mom12m  2399 non-null   float64
dtypes: float64(3), object(2)
memory usage: 112.5+ KB

First few rows

data.head(3)
ticker date ret roeq mom12m
0 JJSF 2021-12 0.161173 0.035539 0.030303
1 ELA 2021-12 0.000000 0.109925 -0.054113
2 PLXS 2021-12 0.139648 0.027229 0.168786

Summary statistics

data[["roeq", "mom12m", "ret"]].describe().round(3)
roeq mom12m ret
count 2399.000 2399.000 2399.000
mean -0.011 0.399 0.023
std 1.899 0.909 0.142
min -83.912 -0.833 -0.580
25% -0.007 0.021 -0.023
50% 0.028 0.256 0.036
75% 0.055 0.539 0.085
max 22.041 21.256 3.338

Regressions

Binned regressions

  • Sort into groups based on explanatory variable
    • Percentiles of explanatory variable
    • Deciles of explanatory variable
  • Compute means of explanatory and dependent variable in each group.
  • When dependent variable = ret, mean is return of equally weighted portfolio.
  • Regress dependent variable means on explanatory variable means.

Portfolio returns by group

  • Average predictor values may not be important.
  • We really want to know whether stocks with high values beat stocks with low values.
  • Put the order of the predictor group (1 through 100 or 1 through 10) on the x-axis.